The first question generally raised is, “what is a”hydrofabric”? To date, the term has been nebulous at best and has been used to describe data as narrow as a set of cartographic lines and network connectivity, all the way to encompass the entire spatial data architectures needed to map and model the flow of water and flood extents.
Here, a “Ngen Hydrofabric” will include the following four parts:
The blue line features (flowpaths) and per-flowline catchments features
flowline/catchment topologies
Attributes to support routing and runoff modeling
The software and data models to make the creation of these open, reproducible, and flexible.
In this breakdown, features define the computational elements for hydrology and hydraulic modeling; topologies link data together for space/time processing (modular elements that act as a whole); attributes provide the information for model execution (physics based, conceptual and ML/AI model formulations); and software and data models develop community standards, reproducibility, and flexibility to support analysis at scale.
A discritization for the land surface one wants to model is fundamental to all modeling tasks. Without it, distributed and lumped models have not way to apply the needed equations, model formulations or computer science applications. Therefore anyone who cares about the science and application of water resource modeling and forecasting should care about the underlying data that supports it.
The issue is that describing the earths surface - particularly at continental scales - is a tricky task. Automated techniques can get us a long way in representation, however the modeling task at hand and local knowledge should be used in developing an authoritative product.
The aim of Ngen is to provide a framework in which heterogeneous models can be used to achieve the best results possible in the highest number of places. In the framework the hydrofabric is also a malleable product that can be modified to best support a specific application, assuming there is a consistent reference dataset in which it is built from. The centrality of a reference dataset cannot be understated. It not only allows people to avoid replication (and possible error) in the generation of hydrofabric, but it also provides a source in which all variant of the data can be linked together.
For example, there might be specific hydrofabric traits needed for runoff modeling, hydraulic routing, and inundation mapping. If all of these groups were to head out and built there own product from scratch, the ability to link them together to provide actionable information would take a a number of conflation processes that may or may not be easy, let alone feasible. Instead, if all groups were to start from a common reference system - and track the origins of that system - data conflation is simple. This mean time can be spent on the science rather then the integration.
As part of NOAA’s efforts to build Ngen, this reference fabric is being built.
Additionally, workflows for modify this reference data set to support scale dependent modeling tasks are in development. As such, in addition to a national reference fabric, NOAA will release a national modeling fabric that will be used in the initial implementation of the National Water Model built on Ngen.
So, can you build your own? For now all of these products are under heavy development but are not close to final. The anticipated goal for finishing these products in a beta release are fiscal year 2023.
Building a Ngen ready hydrofabric only requires one package that can be installed from Github with:
remotes::install_github("mikejohnson51/hyAggregate")The goal of hyAggregate is to develop aggregated hydrologic and hydraulic networks to inform modeling tasks reliant on uniform length (flowpath) and area (catchment) distributions.
In this document we illustrate the tool set for working with the established reference fabric to create Ngen ready data products. We focus on VPU 01 (east coast USA), as it is the most heavily tested. The workflow described here will work on any reference fabric artifact once they are produced, validated and published to ScienceBase.
The following steps walk you through the concepts and tools for building a Ngen ready dataset, what the outputs look like, and how you might interact with them. hyAggregate is part of a larger family of projects and packages aiming to support federal (USGS/NOAA) water modeling efforts. The whole suite of development tools can be installed with:
remotes::install_github("NOAA-OWP/hydrofabric")Attaching this library, similar to the tidyverse, installs and loads a canon of software designed to manipulate, modify, describe, process, and quantify hydrologic networks and land surface attributes:
library(hydrofabric)| Repo | Purpose |
|---|---|
| USGS-R/nhdplusTools | Tools for for network manipulation |
| dblodgett-usgs/hyRefactor | Tools for network refactoring |
| mikejohnson51/hyAggregate | Tools for distribution based network aggregation and Ngen file creation |
| mikejohnson51/opendap.catalog | Tools for for accessing remote data resources for parameter and attributes estimation |
| mikejohnson51/zonal | Tools for rapid catchment parameter summarization |
hyAggregate relies on data products generated as part of the Geospatial Fabric for National Hydrologic Modeling, version 2.0 project, the general outline of which, can be seen below:
Figure 1
In the first row of Figure 1, there are three (3) reference products.
An updated network attributes table that provides attributes for the network features in the data model of NHDPlusV2, but with substantial improvements based on contributions from the USGS, NOAA OWP, NCAR and others.
A set of reference catchment geometries in which geometric errors and artifacts in the NHDPlus CatchmentSP layer are corrected/removed.
A set of reference flowline geometries where headwater flowlines have been replaced with the NHDPlus Burn lines
The CONUS reference files for these datasets can be downloaded here respectively (attributes, catchments, flowpaths)
In the second row of Figure 1, the reference products are refactored based on a minimum flowpath criterion. This process is facilitated by the hyRefactor2 R package. The concept of refactoring includes (1) Splitting large or long catchments in the reference data to create a more uniform catchment size distribution and (2) collapsing catchment topology by removing very small inter confluence, merge very small headwaters. The goal of refactoring is to NOT reduce the fidelity of the network, but instead to move the network to a more uniform/coherent version of the network.
Refactoring is a parameter-based workflow, the selection of these parameters does impact the resulting network, for the National Scale fabric
| Parameter | Purpose | Elected Value |
|---|---|---|
| split_flines_meters | the maximum length flowpath desired in the output. | 10,000 |
| collapse_flines_meters | the minimum length of inter-confluence flowpath desired in the output. | 1,000 |
| collapse_flines_main_meters | the minimum length of between-confluence flowpaths. | 1,000 |
The refactored output is shared under the Refactored Parent item of the above ScienceBase resource (available here)
::: {#hello .blackbox .left} NOTE:
There is a difference between the use of flowline and flowpath. Per the hy_features standard, a catchment can have one primary flowpath (1:1) but multiple flowlines (1:many).
There is also a difference in the concept of a catchment, which is the holistic unit of hydrology, and a catchment divide which is the edge bounded by an inflown and outflow node
For the development of the reference fabric and modeling task outputs, we seek to define a set of divides and corresponding flowpaths from the reference flowlines and catchments. :::
All reference, refactored and Ngen hydrofabric archives live on ScienceBase. They can be accessed with the web interface or can be downloaded programatically.
The hyAggregate::get_reference_fabric() utility will download the most current geofabric for a Vector Processing Unit (VPU). Options include downloading the “refactored” (default) or “reference” data. If the requested file already exists the file path will be returned. Here we can find the local path to the reference fabric for VPU=01 in the ./data directory, and explore the layers contained within:
VPU = "01"
ref_file = get_reference_fabric(VPU = VPU, type = "reference", dir = "data")
st_layers(ref_file)
#> Driver: GPKG
#> Available layers:
#> layer_name geometry_type features fields
#> 1 reference_flowline Line String 65080 31
#> 2 reference_catchment Polygon 65968 6
#> 3 WB Multi Polygon 85 21
#> 4 POIs Point 3185 12Within the reference fabric artifacts - the reference_flowline and reference_catchment layers are those associated with the reference data (row 1 in figure 1). (NOTE: In some artifacts, these are called nhd_flowlines and nhd_catchments per this issue)
We can also request the data products associated with the national refactor:
file = get_reference_fabric(VPU = VPU, type = "refactored", dir = "data")
st_layers(file)
#> Driver: GPKG
#> Available layers:
#> layer_name geometry_type features fields
#> 1 refactored_flowpaths Line String 40195 7
#> 2 refactored_divides Polygon 40195 4
#> 3 mapped_POIs Point 3372 14
#> 4 lookup_table NA 63717 4Within the refactored fabric artifacts - the refactored_flowpaths and refactored_divides layers are the output of the refactoring process (row 2 in figure 1). The remaining layers are those central to the USGS gfv2.0 modeling task (row 3 in figure 1). (NOTE: In some artifacts, these are called reconciled and divides per this issue)
The NOAA Next Generation Water Resource Modeling Framework (Ngen) is a specific modeling task. hyAggregate houses the workflow(s) for generating the needed output for the NOAA NextGen modeling task, that starts from the refactored_flowpaths and refactored_divides layers.
It operates under the assumption that there is no “one model to rule them all” and that different model formulations (e.g. topmodel, wrf-hydro, LSTM) will work better in different locations. Formulations are then modules that can execute at the scale of the catchment. The aim of the hydrofabric is to encapsulate the higher level notion of a catchment (see the HY_features specification) such that a variety of hydraulic and hydrologic models can execute and exchange information.
Because of the notion is to model the runoff process in a heterogeneous way, there is an interest in trying to align the scale of the catchment artifacts to the scale of the hydrologic processes being simulated. The scale identified for this initial run is the 3 - 15 square kilometers, with an ideal size of 10 sqkm.
As seen below, this is not even close to the area/length distributions found in either the reference or refactored artifacts.
To overcome the mismatched scale in the reference/reference fabric and the Ngen modelling task, we need to aggregate the catchments to a defined user threshold/distribution. Therefore, Ngen is a distribution based modeling task that seeks to align the scale of process to scale of catchment representation.
hyAggregate allows users to prescribe an ideal catchment size, a minimum catchment area, and a minimum flowpath length. The default parameters used for Ngen are an ideal catchment size of 10 km², a minimum catchment size of 3 km², and a minimum flowpath length of 1 km. A refactored_fabric can be aggregated to this distribution using hyAggregate:
ngen_v1 = aggregate_network_to_distribution(gf = get_reference_fabric(VPU = VPU, dir = "data"),
outfile = glue("data/01_ngen_{VPU}.gpkg"),
nexus_topology = FALSE)
#> Warning: data/01_ngen_01.gpkg already exists and overwrite is FALSE
st_layers(ngen_v1)
#> Driver: GPKG
#> Available layers:
#> layer_name geometry_type features fields
#> 1 aggregate_flowpaths 17739 10
#> 2 aggregate_divides 17739 2You’ll see that aggregating the network to this scale requires losing resolution of the network. In its current iteration Ngen requires a 1:1 flowpath to catchment relationship. In time the ability to support 1:many relationship will be developed so that the flowpath and/or catchment network can be densified in areas where it is needed. Until then, the 1:1 requirement mean that for each catchment a primary mainstem must be elected, and it is this mainstem only that is reflected in the final aggregated product.
The outputs of the base hyAggregate function create a geopackage with a set of catchment and flowpath features. Below we see the desired distribution has been enforced on the resulting hydrofabric. A more complete discussion of what happens within the aggregation can be found in the code and will be documented latter on.
Geopackages are spatial databases that can be used with many software suites, as well as programming languages that call GDAL (e.g. R, python, Rust). Once read in, the data layers can be mapped with base products, packages (In R, sf, ggplot2, leaflet, mapview – In python fiona, geopandas, geoplot, matlibplot, etc).
In R, here is are a few basic examples:
agg = read_sf(ngen_v1, "aggregate_flowpaths")
plot(agg$geom)
#Define area and project to same Coordinate Reference System
AOI = AOI::aoi_get("Boston") |>
st_transform(st_crs(agg))
boston_flow = st_intersection(agg, AOI)
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries
plot(boston_flow$geom)
plot(AOI$geometry, add = TRUE)mapview::mapview(boston_flow)The flowpath/divides produced in the aggregation provide the features that discritize the landscape and river network into the computational elements that will be used. The next construct is that we need a description of how these features are connected and which
To date, the NHDPlus topology (and many other flow network models) rely on flowpath-to-flowpath connectivity that describes the network in terms of how flowpaths connect to other flowpaths (therefore, which divides connect to other divides). The reference and refactored products retain this ‘flowline-to-flowline’ connectivity as does the ngen_aggregation when nexus_topology=FALSE.
topology = read_sf(ngen_v1, 'aggregate_flowpaths') |>
st_drop_geometry() |>
select(id, toid)
head(topology)
#> # A tibble: 6 × 2
#> id toid
#> <int> <dbl>
#> 1 25946 25944
#> 2 25944 25939
#> 3 25943 25942
#> 4 25942 25939
#> 5 25940 25939
#> 6 25941 25939Ngen is moving away from ‘flowline-to-flowline’ model towards ‘area-to-point’ model. Here the idea of ‘area’ is wrapped up into the holistic idea of a HY_feature catchment and ‘point’ is captured by the conceptual idea of a hydro nexus.
A physiographic unit where hydrologic processes take place. This class denotes a physiographic unit, which is defined by a hydrologically determined outlet to which all waters flow. While a catchment exists, it may or may not be clearly identified for repeated study. This approach considers the catchment concept to be the basic unit of study in hydrology, water resources management, and environmental reporting. The approach is meant to be holistic, referring to the continuous interaction of surface and subsurface waters within a catchment, even if a particular representation of the catchment refers to only surface or subsurface aspects of the catchment.
Conceptual outlet for water contained by a catchment. The hydro nexus concept represents the place where a catchment interacts with another catchment. Every catchment flows to a hydro nexus, conversely every location in a hydrologic system can be thought of as a hydro nexus that drains some catchment. Similar to catchments, hydro nexuses can be realized in several hydrology-specific ways.
If a given hydro nexus does not have a known hydrology-specific realization or is undetermined, it is termed ‘nillable’ in this standard. For example, a hydro nexus exists in the form of flow to the subsurface or atmosphere but may be undetermined and unrepresented within implementations focused on surface water hydrology and would not be included or referenced.
We are doing this to be more compliant with the HY_Features conceptual model for representing surface water components and to ultimately help develop the community data model needed for implementing the hy_feature concepts. This type of flow model was needed due to the scale of aggregation Ngen requires, and the
The long term advantage of this approach is the ability to use multi-scale networks, leverage community efforts in hydrologic addressing and representation, and to have a more defined computational element for Ngen comparability.
In practice, this yields a scenario like below, where catchments may not flow to the coincident catchment but sometimes to the downstream nexus. This is highlighted where ‘cat-4’ not longer flows to ‘cat-3’ but instead to ‘nex-2’. At the same time, ‘cat-3’ is also flowing to ‘nex-2’ suggesting two areas (holistic catchments) are flowing to one point (a hydro nexus).
There are two principle types of nexus locations, those that are implied by the network, and those that occur at locations of interest. So far, Ngen only works on the first. To generate the ‘nexus_topology’ (opposed to flowline) and materialize the implied nexus locations, we can set the nexus_topology=TRUE and rerun the aggregation code.
ngen_v2 = aggregate_network_to_distribution(
gf = get_reference_fabric(VPU = VPU, dir = "data"),
outfile = glue("data/02_ngen_{VPU}.gpkg"))
#> Warning: data/02_ngen_01.gpkg already exists and overwrite is FALSE
st_layers(ngen_v2)
#> Driver: GPKG
#> Available layers:
#> layer_name geometry_type features fields
#> 1 aggregate_flowpaths Multi Line String 17739 9
#> 2 aggregate_divides 17739 3
#> 3 nexus Point 8121 2
#> 4 flowpath_edge_list NA 25864 2This option produces a non-spatial “flowpath_edge_list” table layer within the geopackage. Opening this table highlights the area to point topology where the area (holistic catchment) is realized as the waterbody (‘wb-’) of the flowpath.
head(read_sf(ngen_v2, 'flowpath_edge_list'))
#> # A tibble: 6 × 2
#> id toid
#> <chr> <chr>
#> 1 wb-24713 nex-24401
#> 2 wb-25938 tnex-1000000001
#> 3 wb-25949 tnex-1000000002
#> 4 wb-25557 tnex-1000000003
#> 5 wb-23371 tnex-1000000004
#> 6 wb-23336 tnex-1000000005The choice to use the waterbody realization, is tied to the need for network topology to support hydrologic and hydraulic routing.
The core of the Ngen modeling system uses t-route for hydrologic and hydraulic routing. t-route was designed to support the existing NWM and as such relies on routing parameters defined in the NWM Routelink file.
The get_routelink_path() function will search your hyAggregate R directory for a fst representation of the most current routelink file. If this file does not exist, then the code will (1) Check for the latest version of the nwm domain files on ncep. (2) Download the NetCDF file (3) extract it into a fst file shown below, and (4) save it to the specified directory.
path = get_routelink_path()
fst::metadata_fst(path)
#> <fst file>
#> 2776738 rows, 20 columns (RouteLink_nwm_v2_1_6.fst)
#>
#> * 'comid' : double
#> * 'NHDWaterbodyComID': double
#> * 'gages' : character
#> * 'from' : double
#> * 'to' : double
#> * 'alt' : double
#> * 'order' : double
#> * 'Qi' : double
#> * 'MusK' : double
#> * 'MusX' : double
#> * 'Length' : double
#> * 'n' : double
#> * 'So' : double
#> * 'ChSlp' : double
#> * 'BtmWdth' : double
#> * 'time' : double
#> * 'Kchan' : double
#> * 'nCC' : double
#> * 'TopWdthCC' : double
#> * 'TopWdth' : doubleBy specifying a routelink_path, aggregate_network_to_distribution knows to develop length-averaged routelink parameter and associations (gages and NHDWaterbodies). The length averaging is done by determining the proportion of each reference flowline (COMID) that falls within each aggregated catchment. The associated values are then weighted by the relative length:
ngen_v3 = aggregate_network_to_distribution(
get_reference_fabric(VPU = VPU, dir = "data"),
outfile = glue("data/03_ngen_{VPU}.gpkg"),
nexus_topology = TRUE,
routelink_path = get_routelink_path())
#> Warning: data/03_ngen_01.gpkg already exists and overwrite is FALSE
st_layers(ngen_v3)
#> Driver: GPKG
#> Available layers:
#> layer_name geometry_type features fields
#> 1 aggregate_flowpaths Multi Line String 17739 9
#> 2 aggregate_divides 17739 3
#> 3 nexus Point 8121 2
#> 4 flowpath_edge_list NA 25864 2
#> 5 flowpath_attributes NA 17739 16
head(read_sf(ngen_v3, 'flowpath_attributes'))
#> # A tibble: 6 × 16
#> id gages NHDWaterbodyCom… Qi MusK MusX n So ChSlp BtmWdth
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 wb-1 <NA> <NA> NA NA NA NA NA NA NA
#> 2 wb-10 <NA> <NA> 0 3600 0.2 0.05 1.28e-5 0.178 42.8
#> 3 wb-1… <NA> <NA> 0 3600 0.2 0.0503 4.53e-4 0.245 20.6
#> 4 wb-1… <NA> <NA> 0 3600 0.2 0.06 1.86e-2 0.494 4.20
#> 5 wb-1… <NA> <NA> 0 3600 0.2 0.06 8.38e-2 0.600 2.71
#> 6 wb-1… <NA> <NA> 0 3600 0.2 0.06 7.61e-2 0.577 2.97
#> # … with 6 more variables: time <dbl>, Kchan <dbl>, nCC <dbl>, TopWdthCC <dbl>,
#> # TopWdth <dbl>, length_m <dbl>While routing parameters and topologies are critical for modeling the flow of water, runoff formulations are critical for simulating how much water will be in the channel.
The capacity for Ngen to encapsulate the idea of a catchment and implement a suite of different model formulations requires that critical attribute information is available to inform the wide range of models.
Since Ngen runs runoff models at the catchment scale, and not at the gridded scale (at least as of now), there is a need to summarize gridded information to the catchment scale in a fast, flexible, and performant way.
This requires the ability to identify the quickly and accurately summarizes raster values over polygonal areas including partially covered grid cells. Once the divide/raster cell relationships are established, we must define the way in which those values are summarized (collapsed). For example, do we want the “max temperature per catchment” or the “mean temperature be catchment”?
For the initial Ngen modules (cfe and NOAH-OWP-modular) the following variables from the NWM gridded data are summarized using the following functions:
| Tables | Description | Layer(s) | Summary Function |
|---|---|---|---|
| bexp | Beta Parameter | 4 | mode |
| IVGTYP | Dominant category | 1 | mode |
| ISLTYP | Dominant category | 1 | mode |
| dksat | Saturated Soil Connectivity | 4 | geometric mean |
| psisat | Saturated soil matric potential | 4 | geometric mean |
| slope | Slope Index | 1 | mean |
| smcmax | Saturated value of soil moisture [volumetric] | 4 | mean |
| smcwlt | Wilting point soil moisture [volumetric] | 4 | mean |
| refkdt | Parameter in the surface runoff parameterization | 1 | mean |
| cwpvt | Empirical canopy wind parameter | 1 | mean |
| vcmx25 | Maximum rate of carboxylation at 25 C [ umol CO2/m2/s] | 1 | mean |
| mp | Slope of Conductance to photosynthesis relationship | 1 | mean |
| mfsno | Snowmelt m parameter | 1 | mean |
| Coef | Coefficient | 1 | mean |
| Zmax | Zmax | 1 | mean |
| Expon | Exponent | 1 | mode |
For convenience, these layers have been combined and stored as a single NetCDF on ScienceBase. The get get_nwm_grids function in hyAggregate will either return a file.path if it exists in the provided directory; download the cached file from ScienceBase (if cache_download = TRUE); or, build it from the files available on NCEP.
This summarization is wrapped in the function aggregate_cfe_noahowp, but can be reworked using zonal if more variables, or different summary functions are desired:
ngen_v4 = glue("data/04_ngen_{VPU}.gpkg")
#file.copy(ngen_v3, ngen_v4)ngen_v4 = aggregate_cfe_noahowp(gpkg = ngen_v4, dir = 'data', add_to_gpkg = TRUE)st_layers(ngen_v4)
#> Driver: GPKG
#> Available layers:
#> layer_name geometry_type features fields
#> 1 aggregate_flowpaths Multi Line String 17739 9
#> 2 aggregate_divides 17739 3
#> 3 nexus Point 8121 2
#> 4 flowpath_edge_list NA 25864 2
#> 5 flowpath_attributes NA 17739 16
#> 6 cfe_noahowp_attributes NA 17739 32
dplyr::glimpse(read_sf(ngen_v4, 'cfe_noahowp_attributes'))
#> Rows: 17,739
#> Columns: 32
#> $ id <chr> "cat-10", "cat-100", "cat-10000", "cat-100…
#> $ gw_Coeff <dbl> 0.005, 0.005, 0.005, 0.005, 0.005, 0.005, …
#> $ gw_Zmax <dbl> 21.81905, 61.28427, 189.44409, 189.44409, …
#> $ gw_Expon <dbl> 7, 4, 7, 7, 7, 7, 7, 7, 7, 7, 5, 5, 5, 5, …
#> $ ISLTYP <int> 4, 4, 3, 3, 3, 4, 3, 4, 3, 3, 3, 4, 3, 3, …
#> $ IVGTYP <int> 14, 15, 15, 15, 15, 11, 18, 15, 11, 11, 15…
#> $ `bexp_soil_layers_stag=1` <dbl> 8.307416, 9.663363, 8.189735, 8.189735, 8.…
#> $ `bexp_soil_layers_stag=2` <dbl> 8.307416, 9.663363, 8.189735, 8.189735, 8.…
#> $ `bexp_soil_layers_stag=3` <dbl> 8.307416, 9.663363, 8.189735, 8.189735, 8.…
#> $ `bexp_soil_layers_stag=4` <dbl> 8.307416, 9.663363, 8.189735, 8.189735, 8.…
#> $ `dksat_soil_layers_stag=1` <dbl> 9.674763e-06, 6.814276e-05, 2.274828e-05, …
#> $ `dksat_soil_layers_stag=2` <dbl> 9.674763e-06, 6.814276e-05, 2.274828e-05, …
#> $ `dksat_soil_layers_stag=3` <dbl> 9.674763e-06, 6.814276e-05, 2.274828e-05, …
#> $ `dksat_soil_layers_stag=4` <dbl> 9.674763e-06, 6.814276e-05, 2.274828e-05, …
#> $ `psisat_soil_layers_stag=1` <dbl> 0.8526488, 2.9530284, 0.4077230, 0.4163978…
#> $ `psisat_soil_layers_stag=2` <dbl> 0.8526488, 2.9530284, 0.4077230, 0.4163978…
#> $ `psisat_soil_layers_stag=3` <dbl> 0.8526488, 2.9530284, 0.4077230, 0.4163978…
#> $ `psisat_soil_layers_stag=4` <dbl> 0.8526488, 2.9530284, 0.4077230, 0.4163978…
#> $ cwpvt <dbl> 0.35852596, 0.33221954, 0.32212493, 0.3221…
#> $ mfsno <dbl> 3.8449855, 0.7655500, 0.9985031, 0.9985031…
#> $ mp <dbl> 11.283833, 11.425405, 7.686481, 7.678552, …
#> $ refkdt <dbl> 3.9181907, 3.8679886, 0.7667319, 0.7667319…
#> $ slope <dbl> 0.27817425, 0.09440348, 0.22569042, 0.2256…
#> $ `smcmax_soil_layers_stag=1` <dbl> 0.4745846, 0.3771734, 0.4704441, 0.4864077…
#> $ `smcmax_soil_layers_stag=2` <dbl> 0.4745846, 0.3771734, 0.4704441, 0.4864077…
#> $ `smcmax_soil_layers_stag=3` <dbl> 0.4745846, 0.3771734, 0.4704441, 0.4864077…
#> $ `smcmax_soil_layers_stag=4` <dbl> 0.4745846, 0.3771734, 0.4704441, 0.4864077…
#> $ `smcwlt_soil_layers_stag=1` <dbl> 0.06958908, 0.07951985, 0.04514346, 0.0578…
#> $ `smcwlt_soil_layers_stag=2` <dbl> 0.06958908, 0.07951985, 0.04514346, 0.0578…
#> $ `smcwlt_soil_layers_stag=3` <dbl> 0.06958908, 0.07951985, 0.04514346, 0.0578…
#> $ `smcwlt_soil_layers_stag=4` <dbl> 0.06958908, 0.07951985, 0.04514346, 0.0578…
#> $ vcmx25 <dbl> 48.46579, 75.29526, 72.71961, 60.33285, 77…
tmp = left_join(read_sf(ngen_v4, "aggregate_divides"), read_sf(ngen_v4, "cfe_noahowp_attributes"), by = "id")
ggplot() +
geom_sf(data = tmp, aes(fill = gw_Zmax), color = NA) +
scale_fill_viridis_c()It is worth highlighting that the geopackage is a critical file format for sharing this type of data because it is (1) self containing (2) compact and (3) language agnostic.
There are two broad paths to accessing the data. The first is a GDAL based approach (in R this is using sf, but in python Fiona or geopandas works well).
# Read Options
pacman::p_load(sf, DBI, RSQLite, dplyr)
### 1. GDAL/geopackage
st_layers(ngen_v4)
#> Driver: GPKG
#> Available layers:
#> layer_name geometry_type features fields
#> 1 aggregate_flowpaths Multi Line String 17739 9
#> 2 aggregate_divides 17739 3
#> 3 nexus Point 8121 2
#> 4 flowpath_edge_list NA 25864 2
#> 5 flowpath_attributes NA 17739 16
#> 6 cfe_noahowp_attributes NA 17739 32
nex = read_sf(ngen_v4, "nexus")
head(nex)
#> Simple feature collection with 6 features and 2 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 1977397 ymin: 2906044 xmax: 2084058 ymax: 2983234
#> Projected CRS: NAD83 / Conus Albers
#> # A tibble: 6 × 3
#> id toid geom
#> <chr> <chr> <POINT [m]>
#> 1 nex-175 cat-175 (2040728 2933224)
#> 2 nex-333 cat-333 (1977397 2906044)
#> 3 nex-358 cat-358 (2063599 2983234)
#> 4 nex-366 cat-366 (1992103 2962212)
#> 5 nex-385 cat-385 (2024950 2917374)
#> 6 nex-407 cat-407 (2084058 2966035)The second is as a SQLite database (in R, I am doing this with RSQLite/DBI/dplyr):
### 2. SQLite/Database
db <- dbConnect(SQLite(), ngen_v4)
dbListTables(db)
#> [1] "aggregate_divides"
#> [2] "aggregate_flowpaths"
#> [3] "cfe_noahowp_attributes"
#> [4] "flowpath_attributes"
#> [5] "flowpath_edge_list"
#> [6] "gpkg_contents"
#> [7] "gpkg_extensions"
#> [8] "gpkg_geometry_columns"
#> [9] "gpkg_ogr_contents"
#> [10] "gpkg_spatial_ref_sys"
#> [11] "gpkg_tile_matrix"
#> [12] "gpkg_tile_matrix_set"
#> [13] "nexus"
#> [14] "rtree_aggregate_divides_geom"
#> [15] "rtree_aggregate_divides_geom_node"
#> [16] "rtree_aggregate_divides_geom_parent"
#> [17] "rtree_aggregate_divides_geom_rowid"
#> [18] "rtree_aggregate_flowpaths_geom"
#> [19] "rtree_aggregate_flowpaths_geom_node"
#> [20] "rtree_aggregate_flowpaths_geom_parent"
#> [21] "rtree_aggregate_flowpaths_geom_rowid"
#> [22] "rtree_nexus_geom"
#> [23] "rtree_nexus_geom_node"
#> [24] "rtree_nexus_geom_parent"
#> [25] "rtree_nexus_geom_rowid"
#> [26] "sqlite_sequence"
fpa = collect(tbl(db, "flowpath_attributes"))
head(fpa)
#> # A tibble: 6 × 17
#> fid id gages NHDWaterbodyCom… Qi MusK MusX n So ChSlp
#> <int> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 wb-1 <NA> <NA> NA NA NA NA NA NA
#> 2 2 wb-10 <NA> <NA> 0 3600 0.2 0.05 1.28e-5 0.178
#> 3 3 wb-100 <NA> <NA> 0 3600 0.2 0.0503 4.53e-4 0.245
#> 4 4 wb-100… <NA> <NA> 0 3600 0.2 0.06 1.86e-2 0.494
#> 5 5 wb-100… <NA> <NA> 0 3600 0.2 0.06 8.38e-2 0.600
#> 6 6 wb-100… <NA> <NA> 0 3600 0.2 0.06 7.61e-2 0.577
#> # … with 7 more variables: BtmWdth <dbl>, time <dbl>, Kchan <dbl>, nCC <dbl>,
#> # TopWdthCC <dbl>, TopWdth <dbl>, length_m <dbl>
dbDisconnect(db)The geopackage serves as a concise, interoperaple and efficient format for distributing packaged data that is tabular and spatial. However it does not meet everyone’s need. In particular ngen operates using JSON/geoJSON files. Similarly the ESRI family of tools (e.g. ArcMap/ArcPro/ArcGIS Online) cannot read geopackges.
As such, we offer a final function that generated the ngen files, and optionally shapefiles, from a GPKG.
write_ngen_dir(ngen_v4, export_shapefiles = TRUE)This function creates a directory at the same level and named as the GPKG and outputs the following files:
fs::dir_tree(glue("data/04_ngen_{VPU}"))
#> data/04_ngen_01
#> ├── catchment_data.geojson
#> ├── crosswalk-mapping.json
#> ├── flowpath_edge_list.json
#> ├── flowpath_params.json
#> ├── nexus_data.geojson
#> └── shps
#> ├── aggregate_divides.dbf
#> ├── aggregate_divides.prj
#> ├── aggregate_divides.shp
#> ├── aggregate_divides.shx
#> ├── aggregate_flowpaths.dbf
#> ├── aggregate_flowpaths.prj
#> ├── aggregate_flowpaths.shp
#> ├── aggregate_flowpaths.shx
#> ├── cfe_noahowp_attributes.dbf
#> ├── flowpath_attributes.dbf
#> ├── flowpath_edge_list.dbf
#> ├── nexus.dbf
#> ├── nexus.prj
#> ├── nexus.shp
#> └── nexus.shxIf you have made it this far - thanks! A small detail you might notice is that in the data/ngen_01 directory created from the GPKG, there is a new file called crosswalk-mapping.json
data disambiguation, main_id = mainstem identifier, gage = NWIS siteID, member_comids = NHDPlus (e.g. reference flowlines).
library(jsonlite)
#> Warning: package 'jsonlite' was built under R version 4.1.2
cw = fromJSON('data/04_ngen_01/crosswalk-mapping.json')
# File with gage!
cw$`wb-16354`
#> $member_comids
#> [1] 6782185 6782183 6782267 6782179
#>
#> $outlet_comid
#> [1] 6782185
#>
#> $main_id
#> [1] 24952
# Feature w/o gage
cw$`wb-22117`
#> $member_comids
#> [1] 5867677 5867675
#>
#> $gages
#> [1] "01103040"
#>
#> $outlet_comid
#> [1] 5867677
#>
#> $main_id
#> [1] 33861So the last step here is a non-function-ized way to use these files in conjunction to subset the network based on the flowpaths that flow to a USGS gage.
In the above code chunk, we saw that wb-16354 was associated with NWIS:01172010. To find the network upstream of that gage, we can use the topology (edge list) and the determined identifier to ‘sort’ the network. The returned results is a sorted set if IDs upstream of the provided outlet. These IDs can then be used to extract the flowpath or divide network.
library(nhdplusTools)
#> USGS Support Package: https://owi.usgs.gov/R/packages.html#support
system.time({
trace = get_sorted(read_sf(ngen_v4, 'flowpath_edge_list'),
split = TRUE,
outlets = 'wb-16354')
flowpaths = filter(read_sf(ngen_v4, 'aggregate_flowpaths'), id %in% trace$id)
})
#> user system elapsed
#> 0.431 0.031 0.474
plot(st_geometry(flowpaths), main = paste(nrow(flowpaths), "flowpaths upstream of 'wb-16354'"))Lynker, jjohnson@lynker.com↩︎
Blodgett, D., Johnson, J.M., 2022, hyRefactor: Tools for Manipulating the NHDPlus Network in Preparation for Hydrologic Modeling↩︎